Towards Ideal Semantics for Analyzing Stream Reasoning
Abstract. The rise of smart applications has drawn interest to logical
reasoning over data streams. Recently, different query languages
and stream processing/reasoning engines were proposed in different 
communities. However, due to a lack of theoretical foundations, the 
expressivity and semantics of these diverse approaches are given only 
informally. Towards clear specifications and means for analytic study, 
a formal framework is needed to define their semantics in precise 
terms. To this end, we present a first step towards an ideal semantics 
that allows for exact descriptions and comparisons of stream reasoning 
systems. 

1 Introduction 

The emergence of sensors, networks, and mobile devices has generated
a trend towards pushing rather than pulling of data in information
processing. In the setting of stream processing JT) studied by the 
database community, input tuples dynamically arrive at the processing 
systems in the form of possibly infinite streams. To deal with unboundedness
of data, such systems typically apply window operators to
obtain snapshots of recent data. The user then runs continuous queries 
which are either periodically driven by time or eagerly driven by the 
arrival of new input. The Continuous Query Language (CQL) [3] is a
well-known stream processing language. It has a syntax close to SQL 
and a clear operational semantics. 

Recently, the rise of smart applications such as smart cities, smart 
home, smart grid, etc., has raised interest in the topic of stream rea¬ 
soning (H), i.e., logical reasoning on streaming data. To illustrate 
our contributions on this topic, we use an example from the public 
transport domain. 

Example 1 To monitor a city’s public transportation, the city traffic
center receives sensor data at every stop regarding tram/bus appearances
of the form $tr(X, P)$ and $bus(X, P)$, where $X$ and $P$ hold the
tram/bus and stop identifiers, respectively. On top of these streaming
data tuples (or atoms), one may ask different queries, e.g., to monitor
the status of the public transport system. To keep things simple, we
start with stream processing queries:

($q_1$) At stop $P$, did a tram and a bus arrive within the last 5 min?
($q_2$) At stop $P$, did a tram and a bus arrive at the same time within
the last 5 min?

Consider the scenario of Fig. 1, which depicts arrival times of trams
and buses. The answer to query ($q_2$) is yes for stop $p_1$ and all time
points from 2 to 7. Query ($q_1$) also succeeds for $p_2$ from 11 to 13.

As for stream reasoning, later we will additionally consider a more 
involved query, where we are interested in whether a bus always 
arrived within three minutes after the last two arrivals of trams. ■ 
Figure 1. Traffic scenario with arrivals of trams and buses

Different communities have contributed to different aspects of this
topic. (i) The Semantic Web community extends SPARQL to allow
querying on streams of RDF triples. Engines such as CQELS M 
and C-SPARQL (5) also follow the snapshot semantics approach 
of CQL. (ii) In Knowledge Representation and Reasoning (KRR), 
first attempts towards expressive stream reasoning have been carried 
out by considering continuous data in Answer Set Programming 
(ASP) (9] HD or extending Datalog to sequential logic programs [m. 
However, the state of the art in either field has several shortcomings. 

Approaches in (i) face difficulties with extensions of the formalism
to incorporate the Closed World Assumption, nonmonotonicity, or
non-determinism. Such features are important to deal with missing
or incomplete data, which can temporarily happen due to unstable
network connections or hardware failure. In this case, engines like 
C-SPARQL and CQELS remain idle, while some output based on 
default reasoning might be useful. Moreover, e.g., in the use case 
of dynamic planning on live data, multiple plans shall be generated 
based on previous choices and the availability of new data. This is not 
possible with current deterministic approaches. 

On the other hand, advanced reasoning has extensively been investigated
in (ii), but traditionally only on static data. First attempts
towards stream reasoning reveal many problems to solve. The plain 
approach of (9) periodically calls the dlvhex solver Go) but is not 
capable of incremental reasoning and thus fails under heavy load 
of data. StreamLog flT) is an extension of Datalog towards stream 
reasoning. It always computes a single model and does not consider 
windows. Time-decaying logic programs CD attempt to implement 
time-based windows in reactive ASP m but the relation to other 
stream processing/reasoning approaches has not yet been explored. 

Moreover, as observed in jS], conceptually identical queries may 
produce different results in different engines. While such deviations 
may occur due to differences (i.e., flaws) in implementations of a common
semantics, they might also arise from (correct implementations
of) different semantics. For a user it is important to know the exact 
capabilities and the semantic behavior of a given approach. However, 
there is a lack of theoretical underpinning or a formal framework for 
stream reasoning that allows capturing different (intended) semantics
in precise terms. Investigations of specific languages, as well as
comparisons between different approaches, are confined to experi¬ 
mental analysis |B1, or informal examination on specific examples. A 
systematic investigation, however, requires a formalism to rigorously 
describe the expressivity and the properties of a language. 




Contributions. We present a first step towards a formal framework
for stream reasoning that (i) provides a common ground to express
concepts from different stream processing/reasoning formalisms and
engines; (ii) allows systematic analysis and comparison between existing
stream processing/reasoning semantics; and (iii) also provides a
basis for extension towards more expressive stream reasoning. Moreover,
we present (iv) exemplary formalizations based on a running
example, and (v) compare our approach to existing work.

Thereby, we aim at capturing idealized stream reasoning semantics 
where no information is dropped and semantics are characterized as 
providing an abstract view over the entire stream. Second, we idealize 
with respect to implementations and do not consider processing time, 
delays or outages in the semantics itself. Moreover, we allow for a 
high degree of expressivity regarding time reference: We distinguish 
notions of truth of a formula (i) at specific time points, (ii) some time 
point within a window, or (iii) all time points in a window. Moreover, 
we allow (iv) for nested window operators, which provide a means to 
reason over streams within the language itself (a formal counterpart 
to repeated runs of continuous queries). 

2 Streams 

In this section, we introduce a logic-oriented view of streams and 
formally define generalized versions of prominent window functions. 

2.1 Streaming Data 

A stream is usually seen as a sequence, set or bag of tuples with a 
timestamp. Here, we view streams as functions from a discrete time 
domain to sets of logical atoms and assume no fixed schema for tuples. 

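We build upon mutually disjoint sets of predicates $\mathcal{P}$,
constants $\mathcal{C}$, variables $\mathcal{V}$ and time variables $\mathcal{U}$. The set $\mathcal{T}$ of
terms is given by $\mathcal{C} \cup \mathcal{V}$ and the set $\mathcal{A}$ of atoms is defined as
$\{p(t_1, \dots, t_n) \mid p \in \mathcal{P},\ t_1, \dots, t_n \in \mathcal{T}\}$. The set $\mathcal{G}$ of ground atoms
contains all atoms $p(t_1, \dots, t_n) \in \mathcal{A}$ such that $\{t_1, \dots, t_n\} \subseteq \mathcal{C}$.
If $i, j \in \mathbb{N}$, the set $[i, j] = \{k \in \mathbb{N} \mid i \le k \le j\}$ is called an interval.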

Definition 1 (Stream) Let $T$ be an interval and $v : \mathbb{N} \to 2^{\mathcal{G}}$ an
interpretation function such that $v(t) = \emptyset$ for all $t \in \mathbb{N} \setminus T$. Then, the
pair $S = (T, v)$ is called a stream, and $T$ is called the timeline of $S$.

The elements of a timeline are called time points or timestamps. A
stream $S' = (T', v')$ is a substream or window of stream $S = (T, v)$,
denoted $S' \subseteq S$, if $T' \subseteq T$ and $v'(t') \subseteq v(t')$ for all $t' \in T'$.
The cardinality of $S$, denoted $\#S$, is defined by $\sum_{t \in T} |v(t)|$. The
restriction of $S$ to $T' \subseteq T$, denoted $S|_{T'}$, is the stream $(T', v|_{T'})$,
where $v|_{T'}$ is the usual domain restriction of function $v$.
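
The notions just defined can be made concrete with a small sketch. The representation below is an assumption made for illustration only (it is not part of the formal framework): a stream is a triple of the two timeline bounds and a dictionary for the interpretation, atoms are tuples of strings, and the helper names `atoms_at`, `cardinality`, `restrict` and `is_substream` are hypothetical. The sample data is the tram/bus stream of the running example.

```python
from typing import Dict, FrozenSet, Tuple

Atom = Tuple[str, ...]                 # e.g. ("tr", "a", "p1") stands for tr(a, p1)
Interp = Dict[int, FrozenSet[Atom]]    # time points not listed map to the empty set
Stream = Tuple[int, int, Interp]       # (t_min, t_max, v) with timeline [t_min, t_max]


def atoms_at(s: Stream, t: int) -> FrozenSet[Atom]:
    """v(t); empty outside the timeline and at unlisted time points."""
    lo, hi, v = s
    return v.get(t, frozenset()) if lo <= t <= hi else frozenset()


def cardinality(s: Stream) -> int:
    """#S: the total number of atoms over the timeline."""
    lo, hi, _ = s
    return sum(len(atoms_at(s, t)) for t in range(lo, hi + 1))


def restrict(s: Stream, lo: int, hi: int) -> Stream:
    """S|_{T'} for T' = [lo, hi], which must lie inside the timeline of S."""
    s_lo, s_hi, v = s
    assert s_lo <= lo <= hi <= s_hi, "T' must be a sub-interval of T"
    return (lo, hi, {t: a for t, a in v.items() if lo <= t <= hi and a})


def is_substream(sub: Stream, s: Stream) -> bool:
    """S' is a substream of S: T' within T and v'(t') within v(t') on T'."""
    lo, hi, _ = sub
    s_lo, s_hi, _ = s
    return s_lo <= lo and hi <= s_hi and all(
        atoms_at(sub, t) <= atoms_at(s, t) for t in range(lo, hi + 1))


# Running example: tram a and bus c at stop p1 (time 2), tram d at p2 (time 8),
# bus e at p2 (time 11), on the timeline [0, 13].
S: Stream = (0, 13, {
    2: frozenset({("tr", "a", "p1"), ("bus", "c", "p1")}),
    8: frozenset({("tr", "d", "p2")}),
    11: frozenset({("bus", "e", "p2")}),
})
```

Storing only non-empty time points keeps the dictionary small; the empty set is the implicit default, mirroring the condition on $v$ outside $T$.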

Example 2 (cont’d) The input for the scenario in Example 1 can be
modeled as a stream $S = (T, v)$ where $T = [0, 13]$ and

$v(2) = \{tr(a, p_1), bus(c, p_1)\}$,   $v(11) = \{bus(e, p_2)\}$,
$v(8) = \{tr(d, p_2)\}$,   $v(t) = \emptyset$ otherwise.

The interpretation $v$ can be equally represented as the following set:
$\{2 \mapsto \{tr(a, p_1), bus(c, p_1)\},\ 8 \mapsto \{tr(d, p_2)\},\ 11 \mapsto \{bus(e, p_2)\}\}$. ■

2.2 Windows 

An essential aspect of stream reasoning is to limit the considered data 
to so-called windows, i.e., recent substreams, in order to limit the 
amount of data and forget outdated information. 
Figure 2. Time-based window with range (2,1) and step size 3

Definition 2 (Window function) A window function maps from a
stream $S = (T, v)$ and a time point $t \in T$ to a window $S' \subseteq S$.

The usual time-based window of size $\ell$ contains only the tuples
of the last $\ell$ time units. We give a generalized definition where the
window can also include the tuples of the future $u$ time points. Based
on query time $t$ and a step size $d$, we derive a pivot point $t'$ from which
an interval $[t_\ell, t_u]$ is selected by looking backward (resp., forward) $\ell$
(resp., $u$) time units from $t'$, i.e., $t_\ell + \ell = t'$ and $t' + u = t_u$.

Definition 3 (Time-based window) Let $S = (T, v)$ be a stream
with timeline $T = [t_{min}, t_{max}]$, let $t \in T$, and let $d, \ell, u \in \mathbb{N}$ such
that $d \le \ell + u$. The time-based window with range $(\ell, u)$ and step
size $d$ of $S$ at time $t$ is defined by

$w_d^{\ell,u}(S, t) = (T', v|_{T'})$,

where $T' = [t_\ell, t_u]$, $t_\ell = \max\{t_{min}, t' - \ell\}$ with $t' = \lfloor t/d \rfloor \cdot d$,
and $t_u = \min\{t' + u, t_{max}\}$.
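
A minimal sketch of Definition 3 follows. It assumes a stream is represented as a triple `(t_min, t_max, v)` with `v` a dictionary from time points to sets of atoms; this representation and the function name `time_window` are illustrative assumptions, not part of the framework. The constraint $d \le \ell + u$ is deliberately not enforced, so that the sketch also accepts windows with $\ell = u = 0$ and the default step size 1.

```python
from typing import Dict, FrozenSet, Tuple

Atom = Tuple[str, ...]
Stream = Tuple[int, int, Dict[int, FrozenSet[Atom]]]  # (t_min, t_max, v)


def time_window(s: Stream, t: int, l: int, u: int = 0, d: int = 1) -> Stream:
    """Time-based window with range (l, u) and step size d of s at time t."""
    t_min, t_max, v = s
    assert t_min <= t <= t_max, "t must be a time point of the timeline"
    assert d >= 1, "the step size is assumed to be positive"
    pivot = (t // d) * d                 # t' = floor(t / d) * d
    t_l = max(t_min, pivot - l)          # look l time units backward
    t_u = min(pivot + u, t_max)          # look u time units forward
    # Restrict the interpretation to T' = [t_l, t_u].
    return (t_l, t_u, {tp: a for tp, a in v.items() if t_l <= tp <= t_u})


# Running tram/bus stream on the timeline [0, 13].
S: Stream = (0, 13, {
    2: frozenset({("tr", "a", "p1"), ("bus", "c", "p1")}),
    8: frozenset({("tr", "d", "p2")}),
    11: frozenset({("bus", "e", "p2")}),
})

sliding = time_window(S, 11, l=5)             # sliding window, range 5
tumbling = time_window(S, 11, l=2, u=1, d=3)  # range (2, 1), step size 3
```

With step size 3 the pivot only moves at multiples of 3, so the query times 9, 10 and 11 all select the same interval $[7, 10]$.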

For time-based windows that target only the past $\ell$ time points, we
abbreviate $w_d^{\ell,0}$ with $w_d^{\ell}$. For windows which target only the future,
we write $w_d^{+u}$ for $w_d^{0,u}$. If the step size $d$ is omitted, we take $d = 1$.
Thus, the standard sliding window with range $\ell$ is denoted by $w^{\ell}$.

The CQL syntax for $w_d^{\ell}$ is [Range ℓ Slide d], and $w^{\ell}$
corresponds to [Range ℓ]. Moreover, the window [Now]
equals [Range 0] and thus corresponds to $w^0$. The entire past
stream, selected by [Range Unbounded] in CQL, is obtained
by $w^t$, where $t$ is the query time. To consider the entire stream
(including the future), we can use $w^{n,n}$, where $n = \max T$.

Furthermore, we obtain tumbling windows by setting d = £ + u. 

Example 3 (cont’d) To formulate the monitoring over the stream $S$
of Example 2, one can use a time-based window $w^5$ with a range of 5
minutes (to the past) and step size of 1 minute, i.e., the granularity
of $T$. The results of applying this window function at $t = 5, 11$ are

$w^5(S, 5) = ([0, 5], \{2 \mapsto \{tr(a, p_1), bus(c, p_1)\}\})$, and
$w^5(S, 11) = ([6, 11], \{8 \mapsto \{tr(d, p_2)\},\ 11 \mapsto \{bus(e, p_2)\}\})$.

Moreover, consider a time-based (tumbling) window $w_3^{2,1}$ with
range $(2, 1)$ and step size 3. For $t_1 = 5$, we have $t'_1 = \lfloor 5/3 \rfloor \cdot 3 = 3$,
thus $T'_1 = [\max\{0, 3 - 2\}, \min\{3 + 1, 13\}] = [1, 4]$. For $t_2 = 11$,
we get $t'_2 = 9$ and $T'_2 = [7, 10]$. The windows for $t = 5, 11$ are

$w_3^{2,1}(S, 5) = ([1, 4], \{2 \mapsto \{tr(a, p_1), bus(c, p_1)\}\})$, and
$w_3^{2,1}(S, 11) = ([7, 10], \{8 \mapsto \{tr(d, p_2)\}\})$.

Figure 2 illustrates the progression of this window with time. ■
















The goal of the standard tuple-based window with count $n$ is to
fetch the most recent $n$ tuples. Again, we give a more general definition
which may consider future tuples. That is, relative to a time
point $t \in T = [t_{min}, t_{max}]$, we want to obtain the most recent $\ell$ tuples
(of the past) and next $u$ tuples in the future. Thus, we must
return the stream restricted to the smallest interval $T' = [t_\ell, t_u] \subseteq T$,
where $t_\ell \le t \le t_u$, such that $S$ contains $\ell$ tuples in the interval $[t_\ell, t]$
and $u$ tuples in the interval $[t+1, t_u]$. In general, we have to discard
tuples arbitrarily at time points $t_\ell$ and $t_u$ in order to receive exactly $\ell$
and $u$ tuples, respectively. In extreme cases, where fewer than $\ell$ tuples
exist in $[t_{min}, t]$, respectively fewer than $u$ tuples in $[t+1, t_{max}]$, we
return all tuples of the according intervals. Given $t \in T$ and the tuple
counts $\ell, u \in \mathbb{N}$, we define the tuple time bounds $t_\ell$ and $t_u$ as

$t_\ell = \max\, \{t_{min}\} \cup \{t' \mid t_{min} \le t' \le t \wedge \#(S|_{[t', t]}) \ge \ell\}$, and

$t_u = \min\, \{t_{max}\} \cup \{t' \mid t+1 \le t' \le t_{max} \wedge \#(S|_{[t+1, t']}) \ge u\}$.

Definition 4 (Tuple-based window) Let $S = (T, v)$ be a stream
and $t \in T$. Moreover, let $\ell, u \in \mathbb{N}$, $T_\ell = [t_\ell, t]$ and $T_u = [t+1, t_u]$,
where $t_\ell$ and $t_u$ are the tuple time bounds. The tuple-based window
with counts $(\ell, u)$ of $S$ at time $t$ is defined by
where $X_q \subseteq v(t_q)$, $q \in \{\ell, u\}$, such that $\#(T_q, v'|_{T_q}) = q$.

Note that the tuple-based window is unique only if for
both $q \in \{\ell, u\}$, $v'(t_q) = v(t_q)$, i.e., if all atoms at the endpoints
of the selected interval are retained. There are two natural possibilities
to enforce the uniqueness of a tuple-based window. First, if there
is a total order over all atoms, one can give a deterministic definition
of the sets $X_q$ in Def. 4. Second, one may omit the requirement
that exactly $\ell$ tuples of the past, resp. $u$ tuples of the future are
contained in the window, but instead demand the substream obtained
by the smallest interval $[t_\ell, t_u]$ containing at least $\ell$ past and $u$
future tuples. Note that this approach would simplify the definition
to the restriction $(T', v|_{T'})$, requiring only to select $T' = [t_\ell, t_u]$.

We abbreviate the usual tuple-based window operator which 

looks only into the past, by . Similarly, stands for 

Example 4 (cont’d) To get the last 3 appearances of trams or
buses from stream $S$ in Example 2 at time point 11, we can
apply a tuple-based window with counts $(3, 0)$. The application
of this window to $S$ at time point 11 can lead to two possible
windows $(T', v'_1)$ and $(T', v'_2)$, where $T' = [2, 11]$, and

$v'_1 = \{2 \mapsto \{tr(a, p_1)\},\ 8 \mapsto \{tr(d, p_2)\},\ 11 \mapsto \{bus(e, p_2)\}\}$,

$v'_2 = \{2 \mapsto \{bus(c, p_1)\},\ 8 \mapsto \{tr(d, p_2)\},\ 11 \mapsto \{bus(e, p_2)\}\}$.

The two interpretations differ at time point 2, where either $tr(a, p_1)$
or $bus(c, p_1)$ is picked to complete the collection of 3 tuples. ■
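
The non-uniqueness illustrated above can be reproduced with a short sketch. It covers only the past-looking case with counts $(\ell, 0)$ and assumes the following reading of the tuple-based window: all atoms strictly after the lower bound $t_\ell$ are kept, and at $t_\ell$ itself any subset of the right size may be chosen. The stream representation as a triple `(t_min, t_max, v)` and the names `count`, `past_bound` and `past_tuple_windows` are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Atom = Tuple[str, ...]
Stream = Tuple[int, int, Dict[int, FrozenSet[Atom]]]  # (t_min, t_max, v)


def count(s: Stream, lo: int, hi: int) -> int:
    """Number of atoms of s in the interval [lo, hi]."""
    _, _, v = s
    return sum(len(v.get(tp, ())) for tp in range(lo, hi + 1))


def past_bound(s: Stream, t: int, l: int) -> int:
    """Tuple time bound t_l: the latest t' <= t with at least l atoms in [t', t]."""
    t_min = s[0]
    return max([t_min] + [tp for tp in range(t_min, t + 1) if count(s, tp, t) >= l])


def past_tuple_windows(s: Stream, t: int, l: int) -> List[Stream]:
    """All tuple-based windows with counts (l, 0) of s at time t."""
    _, _, v = s
    t_l = past_bound(s, t, l)
    # Atoms strictly after t_l are always retained.
    inner = {tp: v[tp] for tp in range(t_l + 1, t + 1) if v.get(tp)}
    need = l - sum(len(a) for a in inner.values())
    boundary = sorted(v.get(t_l, ()))
    if need >= len(boundary):
        picks = [tuple(boundary)]        # keep everything at the boundary
    else:
        picks = list(combinations(boundary, need))  # arbitrary choice at t_l
    windows = []
    for pick in picks:
        w = dict(inner)
        if pick:
            w[t_l] = frozenset(pick)
        windows.append((t_l, t, w))
    return windows


S: Stream = (0, 13, {
    2: frozenset({("tr", "a", "p1"), ("bus", "c", "p1")}),
    8: frozenset({("tr", "d", "p2")}),
    11: frozenset({("bus", "e", "p2")}),
})

candidates = past_tuple_windows(S, 11, 3)  # two windows on the timeline [2, 11]
```

Picking the first element of `candidates` amounts to fixing a total order on atoms (here Python's tuple ordering), which is one of the two ways to enforce uniqueness discussed in the text.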

The CQL syntax for the tuple-based window is [Rows n], which
corresponds to the tuple-based window with counts $(n, 0)$. Note that
in CQL a single stream contains tuples of a fixed schema. In the
logic-oriented view, this would translate to having only one predicate.
Thus, applying a tuple-based window on a stream in our sense would
amount to counting tuples across different streams. To enable counting
of different predicates in separation, we introduce a general form of
partition-based windows.

The partition-based window of CQL applies a tuple-based window
function on substreams which are determined by a sequence of attributes.
The syntax [Partition By A1, ..., Ak Rows N]
means that tuples are grouped into substreams by identical
values $a_1, \dots, a_k$ of attributes A1, ..., Ak. From each substream, the N
tuples with the largest timestamps are returned.

Here, we have no notion of attributes. Instead, we employ
a general total index function $idx : \mathcal{G} \to I$ from ground atoms
to a finite index set $I \subseteq \mathbb{N}$, where for each $i \in I$ we obtain
from a stream $S = (T, v)$ a substream $idx_i(S) = (T, v_i)$ by taking
$v_i(t) = \{a \in v(t) \mid idx(a) = i\}$. Moreover, we allow for individual
tuple counts $n(i) = (\ell_i, u_i)$ for each substream $S_i$.

Definition 5 (Partition-based window) Let $S = (T, v)$ be a
stream, $idx : \mathcal{G} \to I \subseteq \mathbb{N}$ an index function, and for all $i \in I$
let $n(i) = (\ell_i, u_i) \in \mathbb{N} \times \mathbb{N}$ and $S_i = idx_i(S)$. Moreover, let $t \in T$
and let $([t^{\ell}_i, t^{u}_i], v'_i)$ be the tuple-based window of
counts $(\ell_i, u_i)$ of $S_i$ at time $t$. Then, the partition-based window of
counts $\{(\ell_i, u_i)\}_{i \in I}$ of $S$ at time $t$ relative to $idx$ is defined by

$w^{\#n}_{idx}(S, t) = (T', v')$, where $T' = [\min_{i \in I} t^{\ell}_i, \max_{i \in I} t^{u}_i]$,
and $v'(t') = \bigcup_{i \in I} v'_i(t')$ for all $t' \in T'$.

Note that, in contrast to schema-based streaming approaches, we have 
multiple kinds of tuples (predicates) in one stream. Whereas other 
approaches may use tuple-based windows of different counts on sepa¬ 
rate streams, we can have separate tuple-counts on the corresponding 
substreams of a partition-based window on a single stream. 

Example 5 (cont’d) Suppose we are interested in the arrival times
of the last 2 trams, but we are not interested in buses. To this end, we
construct a partition-based window $w^{\#n}_{idx}$ as follows. We use index
set $I = \{1, 2\}$, and $idx(p(X, Y)) = 1$ iff $p = tr$. For the counts in
the tuple-based windows of the substreams, we use $n(1) = (2, 0)$
and $n(2) = (0, 0)$. We obtain the substreams

$S_1 = ([2, 13], \{2 \mapsto \{tr(a, p_1)\},\ 8 \mapsto \{tr(d, p_2)\}\})$, and

$S_2 = ([2, 13], \{2 \mapsto \{bus(c, p_1)\},\ 11 \mapsto \{bus(e, p_2)\}\})$,

and the respective tuple-based windows

$w^{\#2}(S_1, 13) = ([2, 13], \{2 \mapsto \{tr(a, p_1)\},\ 8 \mapsto \{tr(d, p_2)\}\})$, and
$w^{\#0}(S_2, 13) = ([13, 13], \emptyset)$.

Consequently, we get $w^{\#n}_{idx}(S, 13) = ([2, 13], v')$, where $v'$ is
$\{2 \mapsto \{tr(a, p_1)\},\ 8 \mapsto \{tr(d, p_2)\}\}$. ■
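
The partition-based window can be sketched along the same lines. The code below is a simplified illustration under several assumptions: only past counts are supported (each $n(i)$ is a single number $\ell_i$, standing for $(\ell_i, 0)$), ties at the lower bound are resolved by Python's ordering of tuples (a total order on atoms), and streams are triples `(t_min, t_max, v)`. The names `past_tuple_window` and `partition_window` are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Atom = Tuple[str, ...]
Interp = Dict[int, FrozenSet[Atom]]
Stream = Tuple[int, int, Interp]  # (t_min, t_max, v)


def past_tuple_window(v: Interp, t_min: int, t: int, l: int) -> Tuple[int, Interp]:
    """Deterministic tuple-based window with counts (l, 0): returns (t_l, v')."""
    def count(lo: int) -> int:
        return sum(len(v.get(tp, ())) for tp in range(lo, t + 1))

    t_l = max([t_min] + [tp for tp in range(t_min, t + 1) if count(tp) >= l])
    inner = {tp: v[tp] for tp in range(t_l + 1, t + 1) if v.get(tp)}
    need = l - sum(len(a) for a in inner.values())
    # Total order on atoms: keep the first `need` atoms at the boundary t_l.
    keep = frozenset(sorted(v.get(t_l, ()))[:max(need, 0)])
    if keep:
        inner[t_l] = keep
    return t_l, inner


def partition_window(s: Stream, t: int, idx: Callable[[Atom], int],
                     counts: Dict[int, int]) -> Stream:
    """Partition-based window of s at time t relative to idx (past counts only)."""
    t_min, _, v = s
    lows, merged = [], {}
    for i, l in counts.items():
        # Substream idx_i(S): only the atoms with index i.
        v_i = {tp: frozenset(a for a in atoms if idx(a) == i)
               for tp, atoms in v.items()}
        t_l, w_i = past_tuple_window(v_i, t_min, t, l)
        lows.append(t_l)
        for tp, atoms in w_i.items():
            merged[tp] = merged.get(tp, frozenset()) | atoms
    # Without future counts, the upper bound of every sub-window is t.
    return (min(lows), t, merged)


S: Stream = (0, 13, {
    2: frozenset({("tr", "a", "p1"), ("bus", "c", "p1")}),
    8: frozenset({("tr", "d", "p2")}),
    11: frozenset({("bus", "e", "p2")}),
})

# Index 1 for trams, index 2 for everything else; last 2 trams, no buses.
last_two_trams = partition_window(
    S, 13, lambda a: 1 if a[0] == "tr" else 2, {1: 2, 2: 0})
```

Because the two substreams are windowed independently, the bus atoms never compete with the trams for the two available slots, which is the point of partitioning.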

3 Reasoning over Streams 

We are now going to utilize the above definitions of streams and 
windows to formalize a semantics for stream reasoning. 

3.1 Stream Semantics 

Towards rich expressiveness, we provide different means to relate
logical truth to time. Similarly as in modal logic, we will use operators
$\Box$ and $\Diamond$ to test whether a tuple (atom) or a formula holds all the
time, respectively sometime in a window. Moreover, we use an exact
operator $@$ to refer to specific time points. To obtain a window of the
stream, we employ window operators $\boxplus_i$.



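Definition 6 (Formulas $\mathcal{F}_k$) The set $\mathcal{F}_k$ of formulas (for $k$ modalities)
is defined by the grammar

$\alpha ::= a \mid \neg\alpha \mid \alpha \wedge \alpha \mid \alpha \vee \alpha \mid \alpha \to \alpha \mid \Diamond\alpha \mid \Box\alpha \mid @_t\alpha \mid \boxplus_i\alpha$

where $a$ is any atom in $\mathcal{A}$, $i \in \{1, \dots, k\}$, and $t \in \mathbb{N} \cup \mathcal{U}$.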

We say a formula $\alpha$ is ground, if all its atoms are ground and for all
occurrences of form $@_t\beta$ in $\alpha$ it holds that $t \in \mathbb{N}$. In the following
semantics definition, we will consider the input stream (urstream)
which remains unchanged, as well as dynamic substreams thereof
which are obtained by (possibly nested) applications of window functions.
To this end, we define a stream choice to be a function that
returns a stream based on two input streams. Two straightforward
stream choices are $ch_i$, for $i \in \{1, 2\}$, defined by $ch_i(S_1, S_2) = S_i$.
Given a stream choice $ch$, we obtain for any window function $w$ an
extended window function $\hat{w}$ by $\hat{w}(S_1, S_2, t) = w(ch(S_1, S_2), t)$ for
all $t \in \mathbb{N}$. We say $\hat{w}$ is the extension of $w$ (due to $ch$).
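
The construction of an extended window function is a plain function composition, as the following sketch shows. The names `ch1`, `ch2`, `extend` and the toy window `now_window` are illustrative assumptions; streams are again assumed to be triples `(t_min, t_max, v)`.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Atom = Tuple[str, ...]
Stream = Tuple[int, int, Dict[int, FrozenSet[Atom]]]  # (t_min, t_max, v)
Window = Callable[[Stream, int], Stream]
ExtendedWindow = Callable[[Stream, Stream, int], Stream]


def ch1(s1: Stream, s2: Stream) -> Stream:
    """Stream choice selecting the first stream."""
    return s1


def ch2(s1: Stream, s2: Stream) -> Stream:
    """Stream choice selecting the second stream."""
    return s2


def extend(w: Window, ch: Callable[[Stream, Stream], Stream]) -> ExtendedWindow:
    """Extension of w due to ch: w_hat(S1, S2, t) = w(ch(S1, S2), t)."""
    def w_hat(s1: Stream, s2: Stream, t: int) -> Stream:
        return w(ch(s1, s2), t)
    return w_hat


def now_window(s: Stream, t: int) -> Stream:
    """Toy window function: keep only the atoms at time point t."""
    _, _, v = s
    return (t, t, {t: v[t]} if v.get(t) else {})


urstream: Stream = (0, 13, {2: frozenset({("tr", "a", "p1")})})
current: Stream = (0, 13, {})   # a window from which all atoms were dropped

from_urstream = extend(now_window, ch1)(urstream, current, 2)
from_current = extend(now_window, ch2)(urstream, current, 2)
```

Passing the urstream as the first argument and the current window as the second means that `ch1` ignores earlier window applications, while `ch2` builds on them.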


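Definition                                               Scope

$\theta(t) = t$                                          time points $t \in \mathbb{N}$
$\theta(u) = \tau(u)$                                    time variables $u \in \mathcal{U}$
$\theta(c) = c$                                          constants $c \in \mathcal{C}$
$\theta(v) = \sigma(v)$                                  variables $v \in \mathcal{V}$
$\theta(p(t_1, \dots, t_n)) = p(\theta(t_1), \dots, \theta(t_n))$    predicates $p \in \mathcal{P}$ and terms $t_i \in \mathcal{T}$
$\theta(\alpha\ b\ \beta) = \theta(\alpha)\ b\ \theta(\beta)$        $b \in \{\wedge, \vee, \to\}$
$\theta(u\alpha) = u\,\theta(\alpha)$                    $u \in \{\neg, \Diamond, \Box\} \cup \{\boxplus_i\}_{i \in \mathbb{N}}$
$\theta(@_u\alpha) = @_t\theta(\alpha)$                  $@_u\alpha$; $t = \theta(u)$
$\theta(\alpha[u]) = \theta(\alpha)[\theta(u)]$          queries $\alpha[u]$

Table 1. Definition of substitution $\theta$ based on query assignment $(\sigma, \tau)$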


We are now going to define the semantics of queries over streams. 


Definition 9 (Query) Let $S = (T, v)$ be a stream, $u \in T \cup \mathcal{U}$ and
let $\alpha$ be a formula. Then $\alpha[u]$ denotes a query (on $S$). We say a query
is ground, if $\alpha$ is ground and $u \in T$, else non-ground.


Definition 7 (Structure) Let $S_M = (T, v)$ be a stream, $I \subseteq \mathbb{N}$ a
finite index set and let $W$ be a function that maps every $i \in I$ to
an extended window function. The triple $M = (T, v, W)$ is called a
structure and $S_M$ is called the urstream of $M$.

We now define when a ground formula holds in a structure. 

Definition 8 (Entailment) Let $M = (T, v, W)$ be a structure. For
a substream $S = (T_S, v_S)$ of $(T, v)$, we define the entailment
relation $\Vdash$ between $(M, S, t)$ and formulas. Let $t \in T$, $a \in \mathcal{G}$,
and $\alpha, \beta \in \mathcal{F}_k$ be ground formulas and let $\hat{w}_i = W(i)$. Then,


For the evaluation of a ground query $\alpha[t]$ we will use $M, S_M, t \Vdash \alpha$.
To define the semantics of non-ground queries, we need the notions
of assignments and substitution. A variable assignment $\sigma$ is a mapping
$\mathcal{V} \to \mathcal{C}$ from variables to constants. A time variable assignment
$\tau$ is a mapping $\mathcal{U} \to \mathbb{N}$ from time variables to time points. The
pair $(\sigma, \tau)$ is called a query assignment. Table 1 defines the substitution
$\theta$ based on query assignment $(\sigma, \tau)$, where $\alpha, \beta \in \mathcal{F}_k$.

Let $q = \alpha[u]$ be a query on $S = (T, v)$. We say a substitution $\theta$
grounds $q$, if $\theta(q)$ is ground, i.e., if $\theta$ maps all variables and time
variables occurring in $q$. If, in addition, $\tau(x) \in T$ for every time
variable $x \in \mathcal{U}$ occurring in $q$, we say $\theta$ is compatible with $q$.


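$M, S, t \Vdash a$                    iff  $a \in v_S(t)$,
$M, S, t \Vdash \neg\alpha$           iff  $M, S, t \nVdash \alpha$,
$M, S, t \Vdash \alpha \wedge \beta$  iff  $M, S, t \Vdash \alpha$ and $M, S, t \Vdash \beta$,
$M, S, t \Vdash \alpha \vee \beta$    iff  $M, S, t \Vdash \alpha$ or $M, S, t \Vdash \beta$,
$M, S, t \Vdash \alpha \to \beta$     iff  $M, S, t \nVdash \alpha$ or $M, S, t \Vdash \beta$,
$M, S, t \Vdash \Diamond\alpha$       iff  $M, S, t' \Vdash \alpha$ for some $t' \in T_S$,
$M, S, t \Vdash \Box\alpha$           iff  $M, S, t' \Vdash \alpha$ for all $t' \in T_S$,
$M, S, t \Vdash @_{t'}\alpha$         iff  $M, S, t' \Vdash \alpha$ and $t' \in T_S$,
$M, S, t \Vdash \boxplus_i\alpha$     iff  $M, S', t \Vdash \alpha$ where $S' = \hat{w}_i(S_M, S, t)$.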


If $M, S, t \Vdash \alpha$ holds, we say $(M, S, t)$ entails $\alpha$. Intuitively, $M$
contains the urstream $S_M$ which remains unchanged and $S$ is the
currently considered window. An application of a window operator $\boxplus_i$
utilizes the extended window $W(i)$ which can take into account both
the urstream $S_M$ and the current window $S$ to obtain a new view,
as we will discuss later. The operators $\Diamond$ and $\Box$ are used to evaluate
whether a formula holds at some time point, respectively at all time
points in the timeline $T_S$ of $S$. The operator $@_t$ allows evaluating
whether a formula holds at a specific time point $t$ in $T_S$.
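
The entailment relation translates almost literally into a recursive evaluator. The sketch below is illustrative only and rests on several assumptions: ground formulas are encoded as nested tuples with the hypothetical tags `"atom"`, `"not"`, `"and"`, `"or"`, `"imp"`, `"dia"`, `"box"`, `"at"` and `"win"`; a structure is a pair of the urstream and a dictionary `W` of extended window functions; streams are triples `(t_min, t_max, v)`. The single window used here is a sliding time-based window of range 5 applied to the current stream.

```python
from typing import Dict, FrozenSet, Tuple

Atom = Tuple[str, ...]
Stream = Tuple[int, int, Dict[int, FrozenSet[Atom]]]  # (t_min, t_max, v)


def time_window(s: Stream, t: int, l: int, u: int = 0, d: int = 1) -> Stream:
    """Time-based window with range (l, u) and step size d of s at time t."""
    t_min, t_max, v = s
    pivot = (t // d) * d
    lo, hi = max(t_min, pivot - l), min(pivot + u, t_max)
    return (lo, hi, {tp: a for tp, a in v.items() if lo <= tp <= hi})


def holds(M, S: Stream, t: int, f) -> bool:
    """Decide whether (M, S, t) entails the ground formula f."""
    urstream, W = M
    lo, hi, v = S
    op = f[0]
    if op == "atom":
        return f[1] in v.get(t, frozenset())
    if op == "not":
        return not holds(M, S, t, f[1])
    if op == "and":
        return holds(M, S, t, f[1]) and holds(M, S, t, f[2])
    if op == "or":
        return holds(M, S, t, f[1]) or holds(M, S, t, f[2])
    if op == "imp":
        return (not holds(M, S, t, f[1])) or holds(M, S, t, f[2])
    if op == "dia":   # some time point of the current timeline
        return any(holds(M, S, tp, f[1]) for tp in range(lo, hi + 1))
    if op == "box":   # all time points of the current timeline
        return all(holds(M, S, tp, f[1]) for tp in range(lo, hi + 1))
    if op == "at":    # exact time reference @_{t'}
        return lo <= f[1] <= hi and holds(M, S, f[1], f[2])
    if op == "win":   # window operator: switch to the window W(i)(S_M, S, t)
        return holds(M, W[f[1]](urstream, S, t), t, f[2])
    raise ValueError(f"unknown operator: {op}")


S: Stream = (0, 13, {
    2: frozenset({("tr", "a", "p1"), ("bus", "c", "p1")}),
    8: frozenset({("tr", "d", "p2")}),
    11: frozenset({("bus", "e", "p2")}),
})
# W(1): sliding window of range 5, extended so that it acts on the current stream.
M = (S, {1: lambda s1, s2, t: time_window(s2, t, 5)})

tr_d, bus_e = ("atom", ("tr", "d", "p2")), ("atom", ("bus", "e", "p2"))
alpha = ("win", 1, ("and", ("dia", tr_d), ("dia", bus_e)))
both_at_p1 = ("and", ("atom", ("tr", "a", "p1")), ("atom", ("bus", "c", "p1")))
simultaneous = ("win", 1, ("dia", both_at_p1))

print(holds(M, S, 11, alpha))
print([t for t in range(14) if holds(M, S, t, simultaneous)])
```

Note how the current stream `S` changes only in the `"win"` case, whereas the urstream travels unchanged inside `M`; this is what makes nested window applications possible.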

Example 6 (cont’d) Let $M = (T, v, W)$, where $S_M = (T, v)$ is
the stream $S$ from Example 2 and $W(1) = \hat{w}^5$, i.e., the extension
of $w^5$ of Example 3 due to $ch_2$. Consider the following formula:

$\alpha = \boxplus_1(\Diamond tr(d, p_2) \wedge \Diamond bus(e, p_2))$

We verify that $M, S_M, 11 \Vdash \alpha$ holds. First, the window operator
$\boxplus_1$ selects the substream $S' = (T_{S'}, v')$, where $T_{S'} = [6, 11]$
and $v' = v|_{T_{S'}} = \{8 \mapsto \{tr(d, p_2)\},\ 11 \mapsto \{bus(e, p_2)\}\}$. Next, to
see that $(M, S', 11)$ entails $\Diamond tr(d, p_2) \wedge \Diamond bus(e, p_2)$, we have to
find time points in the timeline $T_{S'}$ of the current window $S'$, such
that $tr(d, p_2)$ and $bus(e, p_2)$ hold, respectively. Indeed, for 8 and 11,
we have $M, S', 8 \Vdash tr(d, p_2)$ and $M, S', 11 \Vdash bus(e, p_2)$. ■


Definition 10 (Answer) The answer $?q$ to a query $q = \alpha[t]$ on $S$ is
defined as follows. If $q$ is ground, then $?q = yes$ if $M, S_M, t \Vdash \alpha$
holds, and $?q = no$ otherwise. If $q$ is non-ground, then

$?q = \{(\sigma, \tau) \mid \theta$ is compatible with $q$ and $?\theta(q) = yes\}$.

That is, the answer to a non-ground query is the set of query substitutions
such that the obtained ground queries hold.

Example 7 (cont’d) We formalize the queries of Ex. 1 as follows:

$q_1 = \boxplus_1(\Diamond tr(X, P) \wedge \Diamond bus(Y, P))[u]$
$q_2 = \boxplus_1\Diamond(tr(X, P) \wedge bus(Y, P))[u]$

The query $q = \boxplus_1\Diamond(tr(a, p_1) \wedge bus(c, p_1))[t]$ is ground iff $t \in \mathbb{N}$
and $?q = yes$ iff $t \in [2, 7]$. We evaluate $q_1$ on structure $M$ of Ex. 6:

$M, S_M, t \Vdash \boxplus_1(\Diamond tr(a, p_1) \wedge \Diamond bus(c, p_1))$ for all $t \in [2, 7]$

$M, S_M, t \Vdash \boxplus_1(\Diamond tr(d, p_2) \wedge \Diamond bus(e, p_2))$ for all $t \in [11, 13]$

Thus, the following set of substitutions is the answer to $q_1$ in $M$:

$?q_1 = \{(\{X \mapsto a, Y \mapsto c, P \mapsto p_1\}, \{u \mapsto t\}) \mid t \in [2, 7]\}\ \cup$

$\{(\{X \mapsto d, Y \mapsto e, P \mapsto p_2\}, \{u \mapsto t\}) \mid t \in [11, 13]\}$. ■
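
The answer to $q_1$ can be cross-checked by brute force. The sketch below does not implement the general semantics; it hard-codes the meaning of $q_1$ under a sliding window of range 5 and enumerates only groundings built from atoms that occur in the stream (no other grounding can satisfy the two atoms). The function name `answer_q1` and the encoding of an answer as a tuple `(X, Y, P, u)` are assumptions for illustration.

```python
from typing import Dict, FrozenSet, Set, Tuple

Atom = Tuple[str, str, str]   # (predicate, vehicle, stop)


def answer_q1(v: Dict[int, FrozenSet[Atom]], t_min: int, t_max: int,
              rng: int = 5) -> Set[Tuple[str, str, str, int]]:
    """Groundings (X, Y, P, u): a tram X and a bus Y at stop P within the window at u."""
    answers = set()
    for u in range(t_min, t_max + 1):
        lo = max(t_min, u - rng)   # sliding window [max(t_min, u - rng), u]
        window_atoms = {a for tp in range(lo, u + 1) for a in v.get(tp, ())}
        for pred1, x, stop1 in window_atoms:
            for pred2, y, stop2 in window_atoms:
                if pred1 == "tr" and pred2 == "bus" and stop1 == stop2:
                    answers.add((x, y, stop1, u))
    return answers


v = {
    2: frozenset({("tr", "a", "p1"), ("bus", "c", "p1")}),
    8: frozenset({("tr", "d", "p2")}),
    11: frozenset({("bus", "e", "p2")}),
}

for row in sorted(answer_q1(v, 0, 13), key=lambda r: r[3]):
    print(row)
```

Grouping the resulting tuples by their last component gives, for each query time, the substitutions that hold at that time.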

Exact time reference. With the operator @t we can ask whether a 
formula holds at a specific time point t. In its non-ground version, we 
can utilize this operator for the selection of time points. 

Example 8 (cont’d) Let $\alpha = tr(X, P) \wedge bus(Y, P)$. For each
of the queries $@_U\alpha[13]$ and $\alpha[U]$, the time assignments for $U$ in
the answers will map to time points when a tram and a bus arrived
simultaneously at the same stop. In both cases, the single answer
is $(\{X \mapsto a, Y \mapsto c, P \mapsto p_1\}, \{U \mapsto 2\})$. Note that omitting $@_U$
in the first query would give an empty answer, since the subformula $\alpha$
does not hold at time point 13. ■


We observe that the operator $@$ allows replaying a historic query. At
any time $t' > t$, we can ask to simulate a previous query $\alpha[t]$.

Nested windows. Typically, window functions are used exclusively 
to restrict the processing of streams to a recent subset of the input. In 
our view, window functions provide a flexible means to reason over 
temporally local contexts within larger windows. For these nested 
windows we carry both M and S for the entailment relation. 

Example 9 (cont’d) Consider the following additional query ($q_3$):
At which stops $P$, for the last two trams $X$, did a bus $Y$ arrive
within 3 minutes? To answer ($q_3$) at time point 13, we ask

$q_3 = \boxplus_1\Box(tr(X, P) \to \boxplus_2\Diamond bus(Y, P))[13]$.

For $\boxplus_1$, we can use the extension of the partition-based window
of Example 5. Applying $W(1)$ on the stream $S = (T, v)$
in the previous examples yields $S' = (T', v')$, where $T' = [2, 13]$
and $v' = \{2 \mapsto \{tr(a, p_1)\},\ 8 \mapsto \{tr(d, p_2)\}\}$. That is, after applying
this window, the current window $S'$ no longer contains information
on buses. Consequently, to check whether a bus came in both
cases within 3 minutes, we must use the urstream $S_M$. Thus, the
second extended window $W(2) = \hat{w}^{+3}$ is the extension of the
time-based window $w^{+3}$, which looks 3 minutes into the future, due to the
stream choice $ch_1$. Hence, $\hat{w}^{+3}$ will create a window based on $S_M$
and not on $S'$. The two time points in $T'$ where a tram appears are 2
and 8, with $P$ matching $p_1$ and $p_2$, respectively. Applying $W(2)$ there
yields the streams $S'_2 = (T'_2, v'_2)$ and $S'_8 = (T'_8, v'_8)$, where

$T'_2 = [2, 5]$, $v'_2 = \{2 \mapsto \{tr(a, p_1), bus(c, p_1)\}\}$, and

$T'_8 = [8, 11]$, $v'_8 = \{8 \mapsto \{tr(d, p_2)\},\ 11 \mapsto \{bus(e, p_2)\}\}$.

In both streams, we find a time point with an atom $bus(Y, p_j)$
with the same stop $p_j$ as the tram. Thus, in both cases
the subformula $\Diamond bus(Y, P)$ is satisfied and so the implication
$tr(X, P) \to \boxplus_2\Diamond bus(Y, P)$ holds at every point in time of the
stream selected by $\boxplus_1$. Hence, the answer to the query is

$?q_3 = \{(\{X \mapsto a, Y \mapsto c, P \mapsto p_1\}, \emptyset),$

$(\{X \mapsto d, Y \mapsto e, P \mapsto p_2\}, \emptyset)\}$. ■


4 Discussion and Related Work 

In this section we discuss the relationship of this ongoing work with 
existing approaches from different communities. 

Modal logic. The presented formalism employs operators $\Diamond$ and $\Box$ as
in modal logic. Also, the definition of entailment uses a structure
similar to Kripke models for multi-modal logics. However, instead
of static accessibility relations, we use window functions which take
into account not only the worlds (i.e., the time points) but also the
interpretation function. To the best of our knowledge, window operators
have been considered neither in modal logics nor in temporal logics.

CQL. By extending SQL to deal with input streams, CQL queries
are evaluated based on three sets of operators:

(i) Stream-to-relation operators apply window functions to the 
input stream to create a mapping from execution times to bags 
of valid tuples (w.r.t. the window) without timestamps. This 
mapping is called a relation. 

(ii) Relation-to-relation operators allow for modification of relations 
similarly as in relational algebra, respectively SQL. 


(iii) Relation-to-stream operators convert relations to streams by
directly associating the timestamp of the execution with each 
tuple (RStream). The other operators IStream/DStream, which 
report inserted/deleted tuples, are derived from RStream. 

The proposed semantics has means to capture these operators: 

(i) The window operators $\boxplus_i$ keep the timestamps of the selected
atoms, whereas the stream-to-relation operator discards them.
The CQL query for tuple $x$ thus corresponds to a query $\Diamond x$ of
the present setting. A stream in CQL belongs to a fixed schema.
As noted earlier, this corresponds to the special case with only 
one predicate. CQL’s partition-based window is a generalization 
of the tuple-based window defined there. In turn, the presented 
partition-based window generalizes the one of CQL. 

(ii) Some relational operators can essentially be captured by logical 
connectives, e.g., the join by conjunction. Some operators like 
projection will require an extension of the formalism towards 
rules. Moreover, we did not consider arithmetic operators and 
aggregation functions, which CQL inherits from SQL. 

(iii) The answer to a non-ground query $\alpha[u]$ is a set of query assignments
$(\sigma, \tau)$. To capture the RStream of CQL, we can group
these assignments by the time variable $u$.

Example 10 Queries ($q_1$) and ($q_2$) from Example 1 can be expressed
in CQL. We assume that both streams have the attributes X and P,
corresponding to the first, respectively second argument of predicates tr
and bus. For ($q_1$), we can use:

SELECT * FROM tr [RANGE 5], bus [RANGE 5]
WHERE tr.P = bus.P

On the other hand, ($q_2$) needs two CQL queries.

SELECT * AS tr_bus FROM tr [NOW], bus [NOW]
WHERE tr.P = bus.P

SELECT * FROM tr_bus [RANGE 5]

Here, the first query produces a new stream that contains only
simultaneous tuples and the second one covers the range of 5 minutes. ■

Traditionally, stream reasoning approaches use continuous queries, 
i.e., repeated runs of queries with snapshot semantics to deal with 
changing information. In this work, we go a step further and enable
reasoning over streams within the formalism itself by means of
nested windows. One can only mimic this feature with CQL’s snapshot
semantics when timestamps are part of the schema and explicitly
encoded. Likewise, queries to future time points can be emulated in 
this way, as the next example shows. 

Example 11 (cont’d) In Example 9, we considered bus arrivals
within 3 minutes after the last 2 trams. In CQL, such a query is
not possible on the assumed schema. However, by adding a third
attribute TS that carries the timestamps to the schema, the following
CQL query yields the same results.

SELECT * FROM tr [ROWS 2],
bus [RANGE UNBOUNDED]
WHERE tr.P = bus.P AND bus.TS - tr.TS <= 3

Note that we need no partition-based window here, since trams and
buses arrive from different input streams. Moreover, we must use the
unbounded window for buses to cover nesting of windows in ($q_3$)
because windows in CQL are applied at query time and not at the time
where a tram appearance is notified. ■


Furthermore, nested CQL queries and aggregation inherited from SQL
are promising to mimic the behavior of operator $\Box$. With suitable
rewriting, CQL engines like STREAM could be used to realize
the proposed semantics.

SECRET. In (3 a model called SECRET is proposed to analyze the 
execution behavior of different stream processing engines (SPEs) from 
a practical point of view. The authors found that even the outcome
of identical, simple queries varies significantly due to the different
underlying processing models. There, the focus is on understanding, 
comparing and predicting the behaviour of engines. In contrast, we 
want to provide means that allow for a similar analytical study for 
the semantics of stream reasoning formalisms and engines. The two 
approaches are thus orthogonal and can be used together to compare 
stream reasoning engines based on different input feeding modes as 
well as different reasoning expressiveness. 

Reactive ASP. The most recent work related to expressive stream rea¬ 
soning with rules HD is based on Reactive ASP im. This setting in¬ 
troduces logic programs that extend over time. Such programs have the 
following components. Two components $P$ and $Q$ are parametrized
with a natural number $t$ for time points. In addition, a basic component
$B$ encodes background knowledge that is not time-dependent.
Moreover, sequences of pairs of arbitrary logic programs $(E_i, F_j)$,
called online progression, are used. While $P$ and $E_i$ capture accumulated
knowledge, $Q$ and $F_j$ are only valid at specific time points.
Compared to reactive ASP, our semantics has no mechanism for accumulating
programs, and we take only streams of atoms/facts, but
no background theories. Therefore, a framework based on idealized
semantics with extension to rules should be able to capture a fragment
of reactive ASP where $P$ and $F_j$ are empty and $E_i$ contains only facts.
The foreseeable conversion can be as follows: convert rules in $Q$ by
applying an unbounded window on all body atoms of a rule, using $@_t$
to query the truth value of the atoms at time point $t$. Then, conclude
the head to be true at $t$ and feed facts from $E_i$ to the input stream $S$.

StreamLog. Another logic-based approach towards stream reasoning 
is StreamLog E). It makes use of Datalog and introduces temporal 
predicates whose first arguments are timestamps. By introducing sequential
programs which have syntactical restrictions on temporal
rules, StreamLog defines non-blocking negation (for which the Closed
World Assumption can be safely applied) that can be used in recursive
rules in a stream setting. Since sequential programs are locally stratified,
they have efficiently computable perfect (i.e., unique) models.
Similar to capturing a fragment of Reactive ASP, we can capture
StreamLog by converting temporal atoms $p(t, x_1, \dots, x_n)$ to expressions
$@_t p(x_1, \dots, x_n)$ and employing safety conditions to rules to
simulate non-blocking negation. Moreover, we plan for having weaker
notions of negation that might block rules but just for a bounded number
of time points to the future.

ETALIS. The ETALIS system (2 aims at adding expressiveness to 
Complex Event Processing (CEP). It provides a rule-based language 
for pattern matching over event streams with declarative monotonic 
semantics. Simultaneous events are not allowed and windows are not 
regarded as first-class objects in the semantics, but they are available 
at the system implementation level. Tuple-based windows are also not 
directly supported. Furthermore, nesting of windows is not possible 
within the language, but it can be emulated with multiple rules as in 
CQL. On the other hand, ETALIS models complex events with time 
intervals and has operators to express temporal relationships between 
events. 


5 Conclusion 

We presented a first step towards a theoretical foundation for (idealistic)
semantics of stream reasoning formalisms. Analytical tools
to characterize, study and compare logical aspects of stream engines
have been missing. To fill this gap, we provide a framework to reason
over streaming data with a fine-grained control over relating the truth
of tuples with their occurrences in time. It thus, e.g., allows capturing
various kinds of window applications on data streams. We discussed
the relationship of the proposed formalism with existing approaches,
namely CQL, SECRET, Reactive ASP, StreamLog, and ETALIS.

Next steps include extensions of the framework to formally capture 
fragments of existing approaches. Towards more advanced reasoning 
features like recursion and non-monotonicity, we aim at a rule-based 
semantics on top of the presented core. Furthermore, considering intervals
of time as references is an interesting research issue. To improve
practicality (as a tool for formal and experimental analysis) one might 
also develop an operational characterization of the framework. In a 
longer perspective, along the same lines with m, we aim at a formal¬ 
ism for stream reasoning in distributed settings across heterogeneous 
nodes having potentially different logical capabilities. 